LTLf/LDLf Non-Markovian Rewards
Abstract
In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian, i.e., it depends only on the last state and action. This restriction makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle non-Markovian reward functions was the subject of two previous lines of work. Both use LTL variants to specify the reward function and then compile the new model back into a Markovian model. Building on recent progress in temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees.
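The idea of compiling a non-Markovian reward back into a Markovian model can be illustrated with a minimal Python sketch. It is not the construction from the paper: the two-state automaton below is written by hand for the "coffee only following a request" example rather than derived from an LDLf formula, and the names `dfa_step`, `extended_step`, `trace_reward`, and the propositions `request` and `coffee` are hypothetical. The point it shows is that once the automaton state is paired with the MDP state, the reward depends only on the current extended state and step.

```python
from typing import FrozenSet, Hashable, List, Tuple

# Hypothetical propositions and automaton, chosen only for illustration.
Label = FrozenSet[str]  # propositions that hold at one step of the trace


def dfa_step(q: str, label: Label) -> Tuple[str, float]:
    """Advance the hand-written automaton by one step.

    States: "idle" (no pending request) and "pending" (request not yet served).
    Returns the next automaton state and the reward earned on this step.
    """
    if q == "idle":
        # Coffee without a prior request earns nothing.
        return ("pending", 0.0) if "request" in label else ("idle", 0.0)
    # q == "pending": coffee now follows an earlier request, so it is rewarded.
    if "coffee" in label:
        return "idle", 1.0
    return "pending", 0.0


def extended_step(
    ext_state: Tuple[Hashable, str], next_mdp_state: Hashable, label: Label
) -> Tuple[Tuple[Hashable, str], float]:
    """One step of the extended model whose states are (MDP state, automaton state).

    The reward is a function of the extended state and the current label only,
    which is what makes it Markovian in the extended model.
    """
    _, q = ext_state
    q_next, reward = dfa_step(q, label)
    return (next_mdp_state, q_next), reward


def trace_reward(trace: List[Label]) -> float:
    """Total reward collected along a finite trace, starting from "idle"."""
    q, total = "idle", 0.0
    for label in trace:
        q, r = dfa_step(q, label)
        total += r
    return total
```

In this sketch the history that matters (whether a request is still pending) is summarized by a single automaton state, so an off-the-shelf MDP solver could in principle be run on the extended states without inspecting the full trace.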
Related resources
Non-Markovian Rewards Expressed in LTL: Guiding Search Via Reward Shaping
We propose an approach to solving Markov Decision Processes with non-Markovian rewards specified in Linear Temporal Logic interpreted over finite traces (LTLf). Our approach integrates automata representations of LTLf formulae into compiled MDPs that can be solved by off-the-shelf MDP planners, exploiting reward shaping to help guide search. Experiments with state-of-the-art UCT-based MDP plan...
LTLf and LDLf Monitoring: A Technical Report
Runtime monitoring is one of the central tasks to provide operational decision support to running business processes, and check on-the-fly whether they comply with constraints and rules. We study runtime monitoring of properties expressed in LTL on finite traces (LTLf) and in its extension LDLf. LDLf is a powerful logic that captures all monadic second order logic on finite traces, which is o...
LTLf and LDLf Synthesis under Partial Observability
In this paper, we study synthesis under partial observability for logical specifications over finite traces expressed in LTLf/LDLf. This form of synthesis can be seen as a generalization of planning under partial observability in nondeterministic domains, which is known to be 2EXPTIME-complete. We start by showing that the usual “belief-state construction” used in planning under partial observ...
Linear Temporal Logic and Linear Dynamic Logic on Finite Traces
In this paper we look into the assumption of interpreting LTL over finite traces. In particular we show that LTLf, i.e., LTL under this assumption, is less expressive than what might appear at first sight, and that at essentially no computational cost one can make a significant increase in expressiveness while maintaining the same intuitiveness of LTLf interpreted over finite traces. Indeed, w...
Monitoring Business Metaconstraints Based on LTL & LDL for Finite Traces
Runtime monitoring is one of the central tasks to provide operational decision support to running business processes, and check on-the-fly whether they comply with constraints and rules. We study runtime monitoring of properties expressed in LTL on finite traces (LTLf) and its extension LDLf. LDLf is a powerful logic that captures all monadic second order logic on finite traces, which is obta...